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CONSISTENCY OF MARKOV CHAIN QUASI-MONTE CARLO 
ON CONTINUOUS STATE SPACES 

By S. Chen 1 , J. Dick 2 and A. B. Owen 1 

Stanford University, University of New South Wales and 
Stanford University 

The random numbers driving Markov chain Monte Carlo (MCMC) 
simulation are usually modeled as independent U(0, 1) random vari- 
ables. Tribble [Markov chain Monte Carlo algorithms using com- 
pletely uniformly distributed driving sequences (2007) Stanford Univ.] 
reports substantial improvements when those random numbers are 
replaced by carefully balanced inputs from completely uniformly dis- 
tributed sequences. The previous theoretical justification for using 
anything other than i.i.d. U(0, 1) points shows consistency for esti- 
mated means, but only applies for discrete stationary distributions. 
We extend those results to some MCMC algorithms for continuous 
stationary distributions. The main motivation is the search for quasi- 
Monte Carlo versions of MCMC. As a side benefit, the results also 
establish consistency for the usual method of using pseudo-random 
numbers in place of random ones. 



1. Introduction. In Markov chain Monte Carlo (MCMC), one simulates 
a Markov chain and uses sample averages to estimate corresponding means 
of the stationary distribution of the chain. MCMC has become a staple 
tool in the physical sciences and in Bayesian statistics. When sampling the 
Markov chain, the transitions are driven by a stream of independent U (0, 1) 
random numbers. 

In this paper, we study what happens when the i.i.d. U (0, 1) random num- 
bers are replaced by deterministic sequences, or by some dependent U(0, 1) 
values. The motivation for replacing i.i.d. [7(0,1) points is that carefully 
stratified inputs may lead to more accurate sample averages. One must be 
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cautious though, because as with adaptive MCMC [3, 21], the resulting sim- 
ulated points do not have the Markov property. 

The utmost in stratification is provided by quasi-Monte Carlo (QMC) 
points. There were a couple of attempts at merging QMC into MCMC 
around 1970, and then again starting in the late 1990s. It is only recently that 
significant improvements have been reported in numerical investigations. For 
example, Tribble [43] reports variance reductions of several thousand fold 
and an apparent improved convergence rate for some Gibbs sampling prob- 
lems. Those results motivate our theoretical work. They are described more 
fully in the literature survey below. 

To describe our contribution, represent MCMC sampling via x$+i = 0(xj, 
Uj) for i = 1, . . . , n, where xo is a nonrandom starting point and Uj 6 (0, l) d . 
The points Xj belong to a state space Qcl s . The function <j> is chosen 
so that Xj form an ergodic Markov chain with the desired stationary dis- 
tribution 7r when Uj ~ U(0,l) d independently. For a bounded continuous 
function /: SI R, let 9(f) = J Q /(x)vr(x) (fx and 9 n (f) = (1/n) £™ =1 /(x 4 ). 
Then 9 n (f) — >p 9(f) as n — > oo. In this paper, we supply sufficient conditions 
on 4> and on the deterministic sequences Uj so that 9 n (f) — > 9(f) holds when 
those deterministic sequences are used instead of random ones. The main 
condition is that the components of Uj be taken from a completely uniformly 
distributed (CUD) sequence, as described below. 

Ours are the first results to prove that deterministic sampling applied 
to MCMC problems on continuous state spaces is consistent. In practice, 
of course, floating point computations take place on a large discrete state 
space. But invoking finite precision does not provide a satisfying description 
of continuous MCMC problems. In a finite state space argument, the result- 
ing state spaces are so big that vanishingly few states will ever be visited 
in a given simulation. Then if one switches from 32 to 64 to 128 bit repre- 
sentations, the problem seemingly requires vastly larger sample sizes, but in 
reality is not materially more difficult. 

To avoid using the finite state shortcut, we adopt a computational model 
with infinite precision. As a side benefit, this paper shows that the standard 
practice of replacing genuine i.i.d. values Uj by deterministic pseudo-random 
numbers is consistent for some problems with continuous state spaces. We 
do not think many people doubted this, but neither has it been established 
before, to our knowledge. It is already known from Roberts, Rosenthal and 
Schwartz [40] that, under certain conditions, a geometrically ergodic Markov 
chain remains so under small perturbations, such as rounding. That work 
does not address the replacement of random points by deterministic ones 
that we make here. 

1.1. Literature review. There have been a small number of prior at- 
tempts to apply QMC sampling to MCMC problems. The first appears to 
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have been Chentsov [7] , whose work appeared in 1967, followed by Sobol' [42] 
in 1974. Both papers assume that the Markov chain has a discrete state space 
and that the transitions are sampled by inversion. Unfortunately, QMC does 
not usually bring large performance improvements on such unsmooth prob- 
lems and inversion is not a very convenient method. 

Chentsov replaces i.i.d. samples by one long CUD sequence, and this is 
the method we will explain and then adapt to continuous problems. Sobol' 
uses what is conceptually an n x oo matrix of values from the unit interval. 
Each row is used to make transitions until the chain returns to its starting 
state. Then the sampling starts using the next row. It is like deterministic 
regenerative sampling. Sobol' shows that the error converges as 0(l/n) in 
the very special case where the transition probabilities are all rational num- 
bers with denominator a power of 2. These methods were not widely cited 
and, until recently, were almost forgotten, probably due to the difficulty 
of gaining large improvements in discrete problems, and the computational 
awkwardness of inversion as a transition mechanism for discrete state spaces. 

The next attempt that we found is that of Liao [27] in 1998. Liao takes 
a set of QMC points in [0, l] d shuffles them in random order, and uses them 
to drive an MCMC. He reports 4- to 25-fold efficiency improvements, but 
gives no theory. An analysis of Liao's method is given in [44]. Later, Chau- 
dary [6] tried a different strategy using QMC to generate balanced proposals 
for Metropolis-Hastings sampling, but found only small improvements and 
did not publish the work. Craiu and Lemieux [8] also consider multiple-try 
Metropolis and find variance reductions of up to 30%, which is still modest. 
Earlier, Lemieux and Sidorsky [26] report variance reduction factors ranging 
from about 1.5 to about 18 in some work using QMC in conjunction with 
the perfect sampling method of Propp and Wilson [38] . 

Only recently have there been significantly large benefits from the com- 
bination of QMC and MCMC. Those benefits have mainly arisen for prob- 
lems on continuous state spaces. Tribble's [43] best results come from Gibbs 
sampling problems computing posterior means. For problems with d param- 
eters, he used every d-tuple from a small custom built linear feedback shift 
register (LFSR). One example is the well-known model used by Gelfand 
and Smith [16] for failure events of 10 pumps from the article by Gaver 
and O'Murcheartaigh [15]. There are 11 unknown parameters, one for each 
pump and one for the scale parameter in the distribution of pump failure 
rates. A second example is a 42 parameter probit model for vasorestriction 
based on a famous data set from [14] and analyzed using latent variables as 
in Albert and Chib [1]. Of those 42 parameters, the 3 regression coefficients 
are of greatest interest and 39 latent variables are nuisance variables. Table 
1 sets out variance reduction factors found for randomized CUD versus i.i.d. 
sampling. The improvements appear to grow with n, and are evident at very 
small sample sizes. 
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Table 1 

Variance reduction factors from Tribble [43] for two Gibbs sampling problems. For the 
pumps data, the greatest and least variance reduction for a randomized CUD sequence 
versus i.i.d. sampling is shown. For the vasorestriction data, greatest and least variance 
reductions for the three regression parameters are shown. See [43] for simulation details 
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There is another line of research in which large improvements have been 
obtained by combining QMC with MCMC. This is the array-RQMC method 
described in L'Ecuyer, Lecot and Tuffin [24] and other articles. That method 
simulates numerous chains in parallel using quasi-Monte Carlo to update all 
the chains. It requires a complicated method to match the update variables 
for each step to the various evolving chains. This method has achieved vari- 
ance reductions of many thousand fold on some problems from queuing and 
finance. Very few properties have been established for it, beyond the case 
of heat particles in one dimension that was considered by Morokoff and 
Cafiisch [31]. 

Finally, Jim Propp's rotor-router method is a form of deterministic Markov 
chain sampling. It has brought large efficiency improvements for some prob- 
lems on a discrete state space and has been shown to converge at better 
than the Monte Carlo rate on some problems. See, for example, Doerr and 
Friedrich [13]. 

The use of CUD sequences that we study has one practical advantage 
compared to the rotor-router, array-RQMC, regenerative sampling, and the 
other methods. It only requires replacing the i.i.d. sequence used in a typical 
MCMC run by some other list of numbers. 

1.2. Outline. The paper is organized around our main results which ap- 
pear in Section 3. Theorem 2 gives sufficient conditions for consistency of 
QMC-MCMC sampling by Metropolis-Hastings. Theorem 3 gives sufficient 
conditions for consistency of QMC-MCMC sampling for the systematic scan 
Gibbs sampler. 

Section 2 contains necessary background and notation for the two main 
theorems of Section 3. It introduces quasi-Monte Carlo and Markov chain 
Monte Carlo giving key definitions we need in each case. That section 
presents the Rosenblatt-Chentsov transformation. We have combined a clas- 
sic sequential inversion method based on the Rosenblatt transformation with 
an elegant coupling argument that Chentsov [7] used. 
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The consistency results for Metropolis-Hastings (Theorem 2) make mod- 
erately strong assumptions in order to ensure that a coupling occurs. Sec- 
tion 4 shows that those assumptions are satisfied by some Metropolized inde- 
pendence samplers and also by some slice samplers. We also assumed some 
Riemann integrability properties for our MCMC proposals. The Riemann 
integral is awkward compared to the Lebesgue integral, but considering it 
is necessary when we want to study specific algorithms on deterministic in- 
puts. Section 5 gives sufficient conditions for an MCMC algorithm to satisfy 
the required Riemann integrability conditions. 

Our consistency results for the Gibbs sampler (Theorem 3) require some 
contraction properties and some Jordan measurability. Section 6 shows that 
these properties hold under reasonable conditions. Section 7 has a brief 
discussion on open versus closed intervals for uniform random numbers. 
Our conclusions are in Section 8. The lengthier or more technical proofs are 
placed in the Appendix. 

2. Background on QMC and MCMC. 

2.1. Notation. Our random vectors are denoted by x = (x±, . . . , x s ) G 
S]CR S for s > 1. Points in the unit cube [0, l] d are denoted by u = (u±, . . . , u^) 
Two points a, b G W 1 with aj < bj for j = 1, . . . , d define a rectangle YYj=i [ a j > 
bj], denoted by [a, b] for short. The indicator (or characteristic) function of 
a set A C M. d is written 1a- 

We assume the reader is familiar with the definition of the (proper) Rie- 
mann integral, for a bounded function on a finite rectangle [a, b] C M. d . The 
bounded set A C M. d is Jordan measurable if 1a is Riemann integrable on 
a bounded rectangle containing A. By Lebesgue's theorem (see Section 5) A 
is Jordan measurable if Ad(dA) = 0. Here denotes Lebesgue measure on 
M. d , and dA is the boundary of A, that is, the set on which 1a is discontin- 
uous. 

2.2. QMC background. Here, we give a short summary of quasi-Monte 
Carlo. Further information may be found in the monograph by Niedere- 
iter [34]. 

QMC is ordinarily used to approximate integrals over the unit cube [0, l] d , 
for de N. Let xi, . . . ,x„ G [0, l] d . The QMC estimate of 9(f) = L 1]d /(x) (fx 

is n (f) = ^Sr=i/( x *)> just as we would use in plain Monte Carlo. The 
difference is that in QMC, distinct points Xj are chosen deterministically to 
make the discrete probability distribution with an atom of size 1/n at each 
Xj close to the continuous U[0, l] d distribution. 
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The distance between these distributions is quantified by discrepancy 
measures. The local discrepancy of x^, . . . ,x n at a G [0, l] d is 

^ n d 

(1) 5(a) =(5(a;xi,...,x n ) = - ^ l[ , a )(xi) — a ^ 

n i=i j=i 

The star discrepancy of xi , . . . , x n in dimension d is 

(2) D? = D?(x 1 ,...,x n ) = sup |5(a; Xl ,...,x n )|. 

ae[0,l] d 

For d=l, the star discrepancy reduces to the Kolmogorov-Smirnov distance 
between a discrete and a continuous uniform distribution. 

A uniformly distributed sequence is one for which D* d -»• as n — > oo . If 
Xj are uniformly distributed then 6 n (f) — > 6(f) provided that / is Riemann 
integrable. 

Under stronger conditions than Riemann integrability, we can get rates 
of convergence for QMC. The Koksma-Hlawka inequality is 

(3) \0n(f)-m\<D* n d V HK (f), 

where Vhk is the total variation of / in the sense of Hardy and Krause. For 
properties of Vhk and other multidimensional variation measures, see [36]. 

Equation (3) gives a deterministic upper bound on the integration er- 
ror, and it factors into a measure of the points' quality and a measure 
of the integrand's roughness. There exist constructions xi,...,x n where 
D^f = 0(n~ l+e ) holds for any e > 0. Therefore, functions of finite varia- 
tion can be integrated at a much better rate by QMC than by MC. Rates of 
convergence of 0(n~ a (logn) da ), where a > 1 denotes the smoothness of the 
integrand which can therefore be arbitrarily large, can also be achieved [12]. 

Equation (3) is not usable for error estimation. Computing the star dis- 
crepancy is very difficult [19], and computing Vhk(/) is harder than integrat- 
ing /. Practical error estimates for QMC may be obtained using randomized 
quasi-Monte Carlo (RQMC). In RQMC each x; ~ U[0, l] d individually while 
the ensemble xi , . . . , x n has Pr(D* d (xi, . . . ,x n ) < C(\ogn) d /n) = 1 for some 
C < oo. For an example, see [35]. A small number of independent replicates 
of the RQMC estimate can be used to get an error estimate. RQMC has the 
further benefit of making QMC unbiased. For a survey of RQMC, see [25]. 

A key distinction between QMC and MC is that the former is effec- 
tive for Riemann integrable functions, while the latter, in principle, works 
for Lebesgue integrable functions. In practice, MC is usually implemented 
with deterministic pseudo-random numbers. The best generators are proved 
to simulate independent f7[0, 1] random variables based on either discrep- 
ancy measures over rectangles or on spectral measures. Those conditions are 
enough to prove convergence for averages of Riemann integrable functions, 
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but not for Lebesgue integrable functions. As a result, ordinary Monte Carlo 
with pseudo-random numbers is also problematic for Lebesgue integrable 
functions that are not Riemann integrable. 

2.3. Completely uniformly distributed. In the Markov chain context, we 
need a lesser known QMC concept as follows. A sequence u%,U2, . . . G [0, 1] 
is completely uniformly distributed (CUD) if for any d > 1 the points x$ = 

(v,i, . . . ,Ui + d-i) satisfy D* d (xj , . . . , x„ ) — > as n — > oo. This is one of the 
definitions of a random sequence from Knuth [22], and it is an important 
property for modern random number generators. 

Using a CUD sequence in an MCMC is akin to using up the entire period 
of a random number generator, as remarked by Niederreiter [33] in 1986. 
It is then necessary to use a small random number generator. The CUD 
sequences used by Tribble [43] are miniature versions of linear congruential 
generators and feedback shift register generators. As such, they are no slower 
than ordinary pseudo-random numbers. 

In the QMC context, we need to consider nonoverlapping d-tuples x^ = 
(udi-d+i, ■ ■ ■ ,Udi) for i > 1. It is known [7] that 

(4) 

L»; d (xS d) ,...,x^)^0 Vd>l. 

2.4. MCMC iterations. In the QMC context, the function / subsumes 
all the necessary transformations to turn a finite list of i.i.d. U[0, 1] random 
variables into the desired nonuniformly distributed quantities, as well as the 
function of those quantities whose expectation we seek. In some problems, we 
are unable to find such transformations, and so we turn to MCMC methods. 

Suppose that we want to sample x ~ tt for a density function tt defined 
with respect to Lebesgue measure on 1]C1 S . For definiteness, we will seek 
to approximate 6(f) = /(x)-7r(x) dx. In this section, we briefly present 
MCMC. For a full description of MCMC, see the monographs by Liu [28] or 
Robert and Casella [39]. 

In an MCMC simulation, we choose an arbitrary xo G fi with 7r(xo) > 
and then for i > 1 update via 

(5) x i = (^(x i _i,u i ), 

where Uj E [0, l] d and <\> is an update function described below. The distri- 
bution of Xj depends on Xo,...,Xj_i only through Xj_i and so these ran- 
dom variables have the Markov property. The function <f> is chosen so that 
the stationary distribution of Xj is tt. Then we estimate 6(f) by 6 n (f) = 
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- X^r=i /( x «) as before. If a burn- in period was used, we assume that xo is 
the last point of it. 

First, we describe the Metropolis-Hastings algorithm for computing </>(x, u) 
from the current point x G Q and u G [0, l] d . It begins with a proposal y 
taken from a transition kernel P(x,dy). With genuinely random proposals, 
the transition kernel gives a complete description. But for either quasi- Monte 
Carlo or pseudo-random sampling, it matters how we actually generate the 
proposal. We will assume that d—1 U[0,1] random variables are used to 
generate y via y = ipx_( u i: (d-i))- Then the proposal y is either accepted or 
rejected with probability A(x,y). The decision is typically based on whether 
the dth. random variable is below A. 



Definition 1 (Generator). The function ip '■ [0, l] d — » M s is a generator 
for the distribution F on R s if ^(u) ~ F when u ~ U[0, l] d . 

Definition 2 (Metropolis-Hastings update). For xefi, let ^ x : [0, 1] 11 ' 1 - 
Q be a generator for the transition kernel -P(x, dy) with conditional density 
p(- | x). The Metropolis-Hastings sampler has 

0(x,u) = ( y(x ' u) ' u d< A (^,u) 
V 7 \x, u d >A(x,u), 

where y(x,u) = ^ x (ui : ( rf _i)) and 

a / \ ■ f, 7r(y(x,u))p(x | y(x,u)) 
A(x,u) = mm 1, -— - — 

V w(y(x,u)|x) 

Example 1 [Metropolized independence sampler (MIS)]. The MIS up- 
date is a special case of the Metropolis-Hastings update in which y(x, u) = 
•0(u 1; (<2-i)) does not depend on x. 

Example 2 [Random walk Metropolis (RWM)]. The RWM update is 
a special case of the Metropolis-Hastings update in which y(x, u) = x + 
^( u l: (d-i)) f° r some generator tp not depending on x. 

Definition 3 (Systematic scan Gibbs sampler). Let x = (x\, . . . , x s ) G 
]R rf with Xj G and d = Ylj=i kj ■ To construct the systematic scan Gibbs 
sampler, let ipj,x.~ ( u j) be a kj -dimensional generator of the full conditional 
distribution of Xj given X£ for all i^j. This Gibbs sampler generates the 
new point using u G [0, l] d . Write u = (ui, . . . , u s ) with Uj G [0, l] k] . The 
systematic scan Gibbs sampler has 



<K X , U ) = (<Mx,u),(Mx,u),...,<Mx,u)), 
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where, for 1 < j < s, 

M x > u ) = ^>[i]( U j) 
and Ky] = (<j)i (x, u) , . . . , (x, u), x j+1 , . . . , x d ) ■ 

Example 3 (Inversive slice sampler). Let ir be a probability density 
function on U C W. Let n' = {(y,x) | x G 0,0 < y < vr(x)} C and let 

7r' be the uniform distribution on Q' . The inversive slice sampler is the 
systematic scan Gibbs sampler for ir 1 with each kj = 1 using inversion for 
every X[j] . 

There are many other slice samplers. See [32]. It is elementary that (y,x) ~ 
7r' implies x ~ ir. It is more usual to use (x, y), but our setting simplifies when 
we assume y is updated first. 

2.5. Some specific generators. We generate our random variables as func- 
tions of independent uniform random variables. The generators we consider 
require a finite number of inputs, so acceptance-rejection is not directly 
covered, but see the note in Section 8. 

For an encyclopedic presentation of methods to generate nonuniform ran- 
dom vectors, see Devroye [9]. Here, we limit ourselves to inversion and some 
generalizations culminating in the Rosenblatt-Chentsov transformation in- 
troduced below. We will not need to assume that ir can be sampled by 
inversion. We only need inversion for an oracle used later in a coupling ar- 
gument. 

Let F be the CDF of x G R, and for < u < 1 define 

F _1 (u) = inf{x | F(x) > u}. 

Take F~ 1 (0) = lim u ^ + F~ l (u) and = lim^i- F~ l (u), using ex- 

tended reals if necessary. Then x = has distribution F on R when 

u~U[0,l\. 

Multidimensional inversion is based on inverting the Rosenblatt transfor- 
mation [41]. Let F be the joint distribution of x£R s . Let -Fi be the marginal 
CDF of x\ and for j = 2, . . . , s, let Fj(-; x x . (j-i)) be the conditional CDF of 
Xj given x±, . . . , Xj—\. The inverse Rosenblatt transformation i/jr of u G [0, l] s 
is i/jr(u) = x G R s where 

x 1 = F{\u 1 ) 

and 

Xj = F ? " 1 (u j ;x 1:(j _ 1) ), j>l. 

If u~C/[0,l] s , then ^ R (u)~F. 

We will use the inverse Rosenblatt transformation as a first step in a cou- 
pling argument which extends the one in Chentsov [7]. 
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Definition 4 (Rosenblatt-Chentsov transformation). Let tpR be the 
inverse Rosenblatt transformation for the stationary distribution tt and let 
4> be the update function for MCMC. The Rosenblatt-Chentsov transfor- 
mation of the finite sequence Uo,ui,...,u m G [0, 1]^ is the finite sequence 
xo, . . . , x m G Q, C M s , with s < d, where xo = ^i?(uo,i : s ) and x, = 4>(xq, Uj) 
for i = 1, . . . ,m. 

The Rosenblatt-Chentsov transformation starts off using uo and inversion 
to generate xo and then it applies whatever generators are embedded in <j> 
with the innovations Uj, to sample the transition kernel. The transition 
function cj) need not be based on inversion. 

3. Consistency for MCQMC sampling. In this section, we prove suffi- 
cient conditions for some deterministic MCQMC samplers to sample con- 
sistently. The same proof applies to deterministic pseudo-random sampling. 
First, we define consistency, then some regularity conditions, and then we 
give the main results. 

3.1. Definition of consistency. Our definition of consistency is that the 
empirical distribution of the MCMC samples converges weakly to tt. 

Definition 5. The triangular array x ni i, . . . , x njn G M. s for n in an infi- 
nite set N* C N consistently samples the probability density function tt if 

1 n r 

(6) lim - V /(x n>l ) = / /(x)vr(x) dx 

n gfs}* t=l 

holds for all bounded continuous functions / : — > R. The infinite sequence 
xi,X2, . . . 6l s consistently samples tt if the triangular array of initial sub- 
sequences with x nj j = Xj for i = 1, . . . , n does. 

In practice, we use a finite list of vectors and so the triangular array 
formulation is a closer description of what we do. However, to simplify the 
presentation and avoid giving two versions of everything, we will work only 
with the infinite sequence version of consistency. Triangular array versions 
of CUD sampling for discrete state spaces are given in [44]. 

It suffices to use functions / in a convergence-determining class. For exam- 
ple, we may suppose that / is uniformly continuous [4], or that / = l( a ,t>] [5]- 
When 7r is a continuous distribution, we may use / = 1 r a y . 

3.2. Regularity conditions. Here, we define some assumptions that we need 
to make on the MCMC update functions. 
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Definition 6. Let C C [0, 1] have positive Jordan measure. If u EC 
implies that </>(x, u) = </>(x',u) for all x,x' G il, then C is a coupling region. 

Consider two iterations Xj = <^(xj_i,Uj) and x^ = <^(x^_ 1 ,Uj) with the 
same innovations Uj but possibly different starting points xo and Xq. If 
u.j G C, then Xj = x'- holds for all j >i. In Section 4, we give some nontrivial 
examples of MCMC updates with coupling regions. 

Definition 7 (Regular MCMC). Let x m = x m (u , . . . ,u m ) be the last 
point generated in the Rosenblatt-Chentsov transformation, viewed as a func- 
tion on [0, l] rf ( m+1 ) . The MCMC is regular (for bounded continuous func- 
tions) if the function /(x m (uo, ■ ■ ■ , u m )) is Riemann integrable on [0, l] rf ( m+1 ) 
whenever / is bounded and continuous. 

Note that if an MCMC is regular, then the definition of the Rosenblatt- 
Chentsov transformation implies that 

/ /(x m (u ,...,u m ))du ---rfu m = / /(x)7r(x)dx 

J[o,i] d ( m+1 ) JQ 

for any m > and all bounded continuous functions /. 

We can, of course, define regularity for MCMC also with respect to other 
classes of functions. Indeed, there are numerous equivalent conditions for 
regularity. For example, the Portmanteau theorem ([5], Chapter 1.2) implies 
that it is enough to assume that the functions / are bounded and uniformly 
continuous. Of interest are also indicator functions of rectangles since they 
appear in the definition of the local discrepancy at (1). The following theo- 
rem states some equivalent conditions. To simplify the statements, we write 
that MCMC is regular for indicator functions whenever l J 4(x m (uo, . . . , u m )) 
is Riemann integrable on [0, i] rf ( m + 1 ) ) where A is either A = [a, b] with a, b 
finite or A = 0. 

Theorem 1. The following statements are equivalent: 

(i) MCMC is regular for bounded continuous functions. 

(ii) MCMC is regular for bounded uniformly continuous functions. 
(hi) MCMC is regular for indicator functions lr a ,b] of rectangles [a, b]. 

Proof. This result follows by applying the Portmanteau theorem ([5], 
Chapter 1.2) and some methods from real analysis. □ 



A regular MCMC is one that satisfies any (and hence all) of the above. 
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3.3. Main results for Metropolis-Hastings. Theorem 2 below is the main 
result that we will use for Metropolis-Hastings sampling. One does not ex- 
pect CUD sampling to correct for an MCMC algorithm that would not be 
ergodic when sampled with i.i.d. inputs. Ergodicity is assured through our 
assumption that there is a coupling region. Section 4 below shows that some 
nontrivial MCMC methods have such regions. Theorem 2 does not require 
the detailed balance condition that Metropolis-Hastings satisfies, and so it 
may apply to some nonreversible chains too. 

Theorem 2. Let Q C W and letxo G Cl, and fori > 1 letxi = 0(xj_i,u$) 
where <fi is the update function of a regular MCMC with a coupling region C . 
If \ii = -,v d i) for a CUD sequence («i)i>i, then Xi, . . . ,x n con- 

sistently samples ir. 

The proof of Theorem 2 is in the Appendix. It shows that the fraction of 
points Xj in a bounded rectangle [a, b] converges to Jj a b j vr(x) dx. Almost 
the same proof technique applies to expectations of bounded continuous 
functions. 

3.4. Main results for Gibbs sampling. The Gibbs sampler can be viewed 
as a special case of Metropolis-Hastings with acceptance probability one. 
However, it is more straightforward to study it by applying results on it- 
erated function mappings to (5) using methods from Diaconis and Freed- 
man [10] and Alsmeyer and Fuh [2]. 

In this subsection, we assume that ($7, d) is a complete separable metric 
space. We assume that the update function 0(x, u) is jointly measurable 
in x and u and that it is Lipschitz continuous in x for any u. Lipschitz 
continuity is defined through the metric d(-, •) on f2. The Lipschitz constant, 
which depends on u, is 

/ v n 9 , \ d((/>(x,u),^)(x / ,u)) 

<7) <(u)= 5? — w — • 

For each u n G [0, l] d , define L n = £(u n ). 

Next, we present a theorem from Alsmeyer and Fuh [2] on iterated ran- 
dom mappings. The n step iteration, denoted 4> n , is defined by 0i(x;ui) = 
</>(x,ui) and for n > 2 : n (x; m, . . . , u n ) = 0(0 n _i(x;ui, . . . ,Un_i),u n ). 

Theorem 3. Let the update function (/>(x, u) be jointly measurable in x 
and u with Jj ^ d log(£(u)) du < and, for some p > 0, Jj ^ d £(u) p du < oo. 

Assume that there is a point x' G with Jj ^ log + (d((/>(x / , u),x')) du < oo 
and E(d((j)(~x! , u), x') p ) < oo. Then there is a 7* G (0,1) such that for all 
7 G (7*, 1) there is a a 7 G (0, 1) such that for every x,x G £1 

(8) lim a- m Pr( ( i(^ m (x;-),</' m (x;-))>7 m ) = 0. 
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Proof. This follows by specializing Corollary 2.5(a) of [2] to the present 
setting. □ 

Theorem 4. Let (Q,d) be a complete separable metric space and let 
(vi)i>i be a CUD sequence such that for every sequence (d n ) n >i of natural 
numbers with d n = O(logn), we have lim n _ ! . 00 = 0. Let xo G and 
for i > 1 let Xj = 0(xj_i,Uj) be the Gibbs sampler update for stationary 
distribution ir. Assume that 4> satisfies the conditions of Theorem 3 and that 
there is a 7 G (7*, 1) such that 

B m (x,x) = {v G [0, l] dm :^ m (x, v), <£ m (x, v)) > 7 m } 

zs Jordan measurable for all m > 1 and x,ie!l. Under these conditions, if 
the Gibbs sampler is regular, then xi, . . . ,x n consistently samples it. 

The proof of Theorem 4 is in the Appendix. Like Theorem 2, it shows 
that bounded rectangles [a, b] have asymptotically the correct proportion of 
points. Once again, similar arguments apply for bounded continuous func- 
tions of x. 

Although not explicitly stated there, the proof of [11], Theorem 1, shows 
the existence of sequences (i>j)i>i for which 



D* n a ({(v d{i _ 1)+1 ,. . . , v di ),i = l,...,n})< Cy dl ° S( ^ + 1) , 

for all n, d G N, where C > is a constant independent of n and d. Unfortu- 
nately, no explicit construction of such a sequence is given in [11]. Then for 
any sequence (d n ) n >i of natural numbers with d n = O(logn) we obtain that 

D* n dn ({(v dn{i -i )+ i, ■ ■ -,v dni ),i = 1, . . . ,n}) < C' l0g ( n + 1) ^0 as 00 



n 

In Theorem 2, we assumed that the coupling region C is Jordan mea- 
surable In Theorem 4, we do not have a coupling region, but still have an 
analogous assumption, namely that the sets B m (x,x) are Jordan measurable. 
A condition on <fi which guarantees that B m (x,x) is Jordan measurable is 
given in Section 6. 

4. Examples of coupling regions. Theorem 2 used coupling regions. These 
are somewhat special. But they do exist for some realistic MCMC algo- 
rithms. 

Lemma 1. Let (f> be the update for the Metropolized independence sam- 
pler on Q C M s obtaining the proposal y = ip(ui : (d-i)), where ip generates 
samples from the density p, which are accepted when 

vr(y)p(x) 



Ud 



< 



7r(x)i?(y) 
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Assume that the importance ratio is bounded above, that is, 

7r(x) 

K = SUp < OO. 

*en P(x) 

Suppose also that there is a rectangle [a, b] C [0, l] d— 1 of positive volume with 

„ 7r(Wu)) 

77 = inf ; >0. 
ue[a,b] p(V>( u )) 

Then C = [a, b] x [0,77/re] is a coupling region. 



Proof. The set C has positive Jordan measure. Suppose that u G C. Then 
7r(y)p(x) > 7?p(y)-7r(x) > u d p(y)vr(x), 

K 

and so 0(x, u) = y, regardless of x. □ 



Lemma 2. Let ir be a density on a bounded rectangular region = 
[a, b] C M s . Assume that < rj < vr(x) < re < 00 holds for all x G Let 
^' = {(?/> x ) I — 2/ — 7r ( x )} C [a, b] x [0, re] fre £/ie domain of the inversive 
slice sampler. Let (yj,x,j) = 0((yj_i,Xj_i),Uj) /or £ [0, l] s+1 6e i/ie up- 
date /or i/ie inversive slice sampler and put (y^x^) = 0((j^_ 1 ,Xj_ 1 ),Uj). // 
Ui G C = [0,77/re] x [0, l] s , i/ierc x» = x-. 

Proof. If u^i < 77/re, then yj = «ji7r(xj_i) and y^ = Uji7r(x^_ 1 ) are in 
the set [0,r7/re]. The distribution of x given y for any y G [0,77/re] is J7[a,b]. 
Therefore, Xj = x^ = a + u 2 . r fl +i) (b ~~ a ) (componentwise). □ 

Lemma 2 does not couple the chains because yi and y^ are different in 
general. But because Xj = x£, a coupling will happen at the next step, that 
is, (yj+i,x i+ i) = (y- +1 ,x- +1 ) when Uj G [0,77/re] x [0, l] s . One could revise 
Theorem 2 to include couplings that happen within some number t of steps 
after u G C happens. In this case, it is simpler to say that the chain whose 
update comprises two iterations of the inversive slice sampler satisfies Theo- 
rem 2. For a chain whose update is just one iteration, the averages over odd 
and even numbered iterations both converge properly and so that chain is 
also consistent. Alternatively, we could modify the space of y values so that 
all y G [0,77/K] are identified as one point. Then C is a coupling region. 

The result of Lemma 2 also applies to slice samplers that sample y | x 
and then x | y ~ U{x | vr(x) < y} using an s-dimensional generator that is 
not necessarily inversion. 
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5. Riemann integrability. Theorem 2 proves that MCMC consistently 
samples ir when implemented using CUD sequences. We required certain 
Riemann integrability conditions in defining regular Rosenblatt-Chentsov 
transformations. Here, we verify that nontrivial MCMC algorithms can have 
regular Rosenblatt-Chentsov transformations. 

It seems odd to use the Riemann integral over 100 years after Lebesgue [23] . 
But pseudo-random number generators are now typically designed to meet 
an equidistribution criterion over rectangular regions [29]. Other times they 
are designed with a spectral condition in mind. This again is closely related 
to Riemann integrability via the Weyl [45] condition where n (f) —> 6(f) 
for all trigonometric polynomials /(x) = e 2 nv-ik'x jf anc j on iy xi, . . . ,x n 
are uniformly distributed. Unless one is using physical random numbers, 
the Riemann integral, or perhaps the improper Riemann integral is almost 
implicit. 

5.1. Definitions and basic theorems. A function from A C R d to R s for 
s > 1 is Riemann integrable if all of its s components are. To study how 
Riemann integrability propagates, we will use the following two definitions. 

Definition 8. For a function f :R k ->R, the discontinuity set of / is 

D(f) = {x G l fc | / discontinuous at x}. 

If / is only defined on A C R fc , then D(f) = D(f ) where /o(x) = /(x) for 
x £ A and / (x) = for x ^ A. 

Definition 9. For a function / : R fc — > R, the graph of / is 
G(f) = {(x,y)eR k+1 \y = f(x)}. 

Lebesgue's theorem, next, provides a checkable characterization of Rie- 
mann integrability. 

Theorem 5 (Lebesgue's theorem). Let A C R d be bounded and let f:A^ 
R be a bounded function. Then f is Riemann integrable iff \d(D(f)) = 0. 

Proof. See Marsden and Hoffman [30], page 455. □ 

5.2. Need for Riemann integrable proposals. Here, we show that Rie- 
mann integrability adds a special requirement to the way an algorithm is 
implemented. Then we give an example to show that propagation rules for 
Riemann integrability are more complicated than are those for continuity 
and differentiability. 
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Suppose that F is the A/"((q), ( ^)) distribution for some p G (—1, 1). If 
we take 

xi (u) = $- 1 (m) 

and 



X 2 (u)= P X 1 (u) + y / l-p 2 ^ 1 (u 2 ), 

then we find that /(u) = lai<a;i(u)<&i x 1 a 2 <x 2 (u)<fe 2 is discontinuous only 
on a set of measure zero. It is trivially bounded, and these two facts imply 
it is Riemann integrable on [0, l] 2 . 

Another transformation for the same distribution F is 

and 



X2 



-pxi(u) - v 7 ! - p 2 ^ 1 ^), uiG 



Changing the conditional distribution of X2 given xi on a set of measure 
leaves the distribution F of x unchanged. But for this version, we find / 
can be discontinuous on more than a set of measure and so this inverse 
Rosenblatt transformation of F is not regular. 

In practice, of course, one would use the regular version of the transfor- 
mation. But propagating Riemann integrability to a function built up from 
several other functions is not always straightforward. The core of the prob- 
lem is that the composition of two Riemann integrable functions need not 
be Riemann integrable. 

As an example [18], consider Thomae's function on (0, 1), 



(l/q, x =p/qe 
\ 0, else, 



where it is assumed that p and q in the representation p/q have no com- 
mon factors. This / is continuous except on Q n (0, 1) and so it is Rie- 
mann integrable. The function g(x) = lo<x<i is also Riemann integrable. 
But g(f(x)) = 1 X £Q for x G (0, 1), which is famously not Riemann integrable. 
The class of Riemann integrable functions, while more restrictive than we 
might like for conclusions, is also too broad to use in propagation rules. 



5.3. Specializing to MCMC. First, we show that the acceptance-rejection 
step in Metropolis-Hastings does not cause problems with Riemann integra- 
bility. 
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Lemma 3. Let k £ N and suppose that g, h and A are real-valued Rie- 
mann integrable functions on [0, l] k . For u G [0, define 

g(u 1:k ), u k+ i< A(\i 1:k ), 
h(u 1:k ), else. 



/(u) = 

|^ : kit 

Then f is Riemann integrable on [0, l] k+l 



Proof. First, D(f) C ((D(g)UD(h)) x [0,1])UG(4). Riemann integra- 
bility of g gives X k (D(g)) = 0. Similarly, X k (D{h)) = 0. Therefore, A fc+ i(L>0)U 
£>(ZO)x[0,l])=0. 

Turning to we split the domain [0, l] k of A into n fc congruent subcu- 

k 

bes C n< i, . . . ,C n n k (whose boundaries overlap). Then G(A) C IJILi ^ri,t x 
[m i>n , M i)n ], where mj, n = inf Ul: fc6C - n s A(u i: fe ) and M i)n = sup Ul;fc eCn . A(ui :fe ). 

As a result X k+1 (G(h)) < n~ k J2i( M i: 

n 77ij !n ). Riemann integrability of A 
implies this upper bound vanishes as n — > oo. Therefore, \ k +i(G(A)) = 
and so X k+ i(D(f)) = and the result follows by Lebesgue's theorem. □ 

In the MCMC context, g and h are the jth component of the proposal and 
the previous state, respectively, A is the acceptance probability, and u is the 
ensemble of uniform random variables used in m stage Rosenblatt-Chentsov 
coupling and k = (m + l)d — 1. 

For consistency results, we study the proportion of times /(u) € [a,b]. It 
is enough to consider the components one at a time and in turn to show 
i/j(u)<bj and 1/ J ( u ) <aj . are Riemann integrable. However, as the example 
with Thomae's function shows, even the indicator function of an interval 
applied to a Riemann integrable function can give a non- Riemann integrable 
composite function. 

We may avoid truncation by employing bounded continuous test func- 
tions. We will use the following simple corollary of Lebesgue's theorem. 

Lemma 4. For k>l and r > 1, let gi,...,g r be Riemann integrable 
functions from [0, l] k to a bounded interval [a,b] C M. Let h be a continuous 
function from [a, b] k to R. Then 

f{u) = h(g 1 {u),...,g k {u)) 

is Riemann integrable on [0,l] fc . 

Proof. Because h is continuous, D(f) C [fj = iD(g k ). But X k {D(g k )) = 0. 
Therefore, X k (D(f)) = and so / is Riemann integrable by Lebesgue's 
theorem. □ 



We can also propagate Riemann integrability through monotonicity. If g 
is a monotone function from K to R and / is the indicator of an interval, then 
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/ o g is the indicator of an interval too, and hence is Riemann integrable, 
when that interval is of finite length. 

Lemma 5. Let Fi(xi) be the CDF of x\ and for j = 2, . . . , s, let Fj(xj | 
x l:(j— l)) be the conditional CDF of xj given ~x-\-u—\\- Suppose that the 
CDFs Fj(xj | x 1: (j-i)) are continuous functions o/xi : j and that the quantile 
functions F^ l {uj | x 1: ^_i)) are continuous in («j,x 1: € [0,1] X K J_1 , 

for j = 2, . . . ,s. Define functions zi(u) = FF {u\) and Zj(u) = F^ l {uj \ 
z 1: (j_l)(u)) forj = 2,...,s, where *u{j-\) = ( z i,-- -,Zj-i). Then for be R s , 
the set 

5(b) = {u | Zj (u) < bj, 1 < j < s} 

is Jordan measurable. 

Proof. By hypothesis, is a continuous function of u € [0, l] s , for k = 
1, . . . , s, and so is F)~(bk \ z 1: ( fc _ 1 )(u)). This latter only depends on u 1: 
for k = 2, . . . , s, and so we write it as <?fc(u 1: (&_!)). 

For fc = l,...,s, let S k = {u i:fc | Uj < gj(u 1: for j = 1, . . . , A;}. The 

set 5i is the interval [0, F 1 ~ 1 (6i)], and hence is Jordan measurable. Suppose 
Sk is Jordan measurable for k < s. Then 

S k+1 = (S k x [0, 1]) n G k+1 where G k+1 = {u 1: (fc+1 ) | u k+ i < g k +i(u-i ■. k)}- 

The set S k x [0, 1] is Jordan measurable because S k is. The boundary of G k+ \ 
is contained within the intersection of the graph of g k +± and the boundary of 
[0, l] fc+1 and so G k +i is Jordan measurable. The result follows by induction 
because S(h) = S s . □ 

5.4. Regularity of Rosenblatt-Chentsov. Here, we give sufficient condi- 
tions for the Rosenblatt-Chentsov transformation to be regular. 

Theorem 6. For integer m > 0, letx m be the endpoint of the Rosenblatt- 
Chentsov transformation of [0,l]^ +1 ^ m , started with a Riemann integrable 
function ipR and continued via the Metropolis-Hastings update (j). Let <j) be 
defined in terms of the proposal function y : M s x [0, l] d_1 — > R s with proposal 
density p(-,-) :M S xI s -> [0,oo) and target density tt :M S — > [0,oo). Let f be 
a bounded continuous function on R s . 

Jf ip is bounded and y, P and 7r are bounded continuous functions, then 
/(x m (uo, . . . , u m )) is a Riemann integrable function of the variables [0, l]( d+1 )" 
used in the Rosenblatt-Chentsov transformation. 

Proof. We only need to show that Riemann integrable function 

of (uq, . . . , u m ) G [0, l] d ( m + 1 ) and then the result follows by Lemma 4. 
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We proceed by induction. For m = 0, xo = ?/>(uo) is bounded and contin- 
uous on [0, l] d , hence it is Riemann integrable. 

Now suppose that x m _i is a Riemann integrable function on [0, l] dm . 
Let h(uo, . . . , u m _i, u m i. be the value x m _i, written as a Riemann 

integrable function on [0, l] dm+rf_1 , so it ignores its last d — 1 arguments. 
Let 0(u o ,...,u m _i,u ml: be the proposal y m = y(x m _i, u. ml . = 

y(g(-), x m _i , u ml .( d _;n). This is a continuous function y(-,-) of two Rie- 
mann integrable functions on [0, i] rf ( m + 1 )~ 1 an d so it is Riemann integrable. 
Next, A(-, •) is a continuous function of both x m _i and y m which are in turn 
Riemann integrable functions on [0, i]» m + d_1 ) an d so A(-, •) is Riemann inte- 
grable. Then x m is a Riemann integrable function on [0, l] rfm + rf 5 by Lemma 3, 
completing the induction. □ 

6. Conditions for the Gibbs sampler. In studying the Gibbs sampler, we 
made several assumptions. First, we required Jordan measur ability for the 
sets £> m (x,x). Second, we required a contraction property. In this section, 
we show that those assumptions are reasonable. 

6.1. Jordan measurability of 23 m (x,x). We give an example where the 
conditions of Theorem 4 are satisfied, that is, the sets B m (x, x) are Jordan 
measurable for all m > 1 and x,x G O (for some suitable domain !] C K s ). 
Assume (additionally to the assumptions made in Theorem 4) that </>(x, u) 
is totally differentiable with continuous derivative with respect to u for each 
x£!J and that d is based on the L p norm for some 1 < p < oo. Further, 
assume that the gradient of d(0(x, u), 0(x, u)) with respect to u vanishes 
only on a null set for all x, x G $7, x/x, that is, 

A({u G [0, l] d : V u d((f>(x, u), cj){% u)) = 0}) = 0, 

for all x,x£S], x/x, where A denotes the Lebesgue measure and where 
V u d(cf)(x, u), 0(x, u)) = (g|- d(0(x, u), 0(x, u))) J=lj ... >rf denotes the gradient. 
Then, for all m > 1, we also have 

A({u G [0, l] dm : V u d(0 m (x, u), m (x, u)) = 0}) = 

for all x,x G fi, x/x, Let x,x G with x/x be fixed. Then for almost 
all u* G [0,l] dm we have V u d(<£ m (x, u*), m (x, u*)) / 0. Therefore, there 
is a 5 > such that V u cZ(<^ m (x, u), m (x,u)) / for all u G A^(u*), where 
AT 5 ( U *) = {v G [0, l] dm : ||u* — v||l 2 < 5} is a neighborhood of u*. Therefore, 
the directional derivative at a point u G Ng(u*) is different from 0, except on 
a hyperplane, that is, almost everywhere. Hence, by the mean value theorem, 
the function d((j) m (x, u),(p m (Sc, u)) for u G iV^u*) can at most be constant on 
a hyperplane, which has Lebesgue measure 0. Note that N s (u*) n Q dm / 0, 
therefore there is a countable number of elements u*,u|,... and numbers 
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61,62, - ■ ■ with the properties of u* and 6 described above and for which we 
have |Jn=i N s„«) = [0, l] dm - Therefore, we have 

A({u G [0, l] dm : d(^ m (x, u),^ m (x, u)) = c}) = 0, 

for any c > 0. 

The set of points where lB m ( x ,x) is discontinuous is given by 

D = {u G [0, l] dm : > 3v, v' G N s (u) such that 

(x,v)) > 7 m and d(0 m (x, v'), v')) < 7™}- 

As £> m (x,x) and {u G [0, l] dm :d(0 m (x, u),0 m (x,u)) < 7 m } are open, it fol- 
lows that 

D C {u G [0, : d(0 m (x, u), m (x, u)) = 7 m }. 

Therefore, Xd m (D) = and Lebesgue's theorem (see Theorem 5) implies that 
£> m (x, x) is Jordan measurable. 

6.2. Contraction. Here, we illustrate how the Gibbs sampler yields a con- 
traction for the probit model. In this model, 

Zi = xJ/3 + ej 

and 

Yi = lZi>o, 

for i = l,...,n for independent ej ~ jV(0, 1). The coefficient /3 G R p has 
a noninformative prior distribution. The predictors are x$ G W. We define 
the matrix X with element Xij. We assume that X has rank p. 

The state of the Markov chain is (/3, Z) G C M p+n , where Z = (Z 1 , . . . , 
Z n ) T . Given the observed data (y±, . . . ,y n ,xi, . . . ,x n ), we can use the Gibbs 
sampler to simulate the posterior distribution of /3 and Z = {Z\, . . . , Z n ) T . 
A single step of the Gibbs sampler makes the transition 

y Z (k-i) J — > ^ Z (fc) J — > J 

for A; > 1 using generators given explicitly below. The values u±, . . . , u n+p are 
the components of G (0, l) n+p . We also write the transitions as 

(0, Z) -> 0((/3, Z), u) = (</>«((/?, Z), u), 0( 2 ) ((/?, Z), u)), 

where <^> and its components <j)W and 0( 2 ) (for /3 and Z, resp.) are given 
explicitly below. 

Given /3, the components of Z are independent, with 

fAA(x7/3,l)|Z 4 >0, ify i = i, 

*~ \aa(x7/3,i)|z 4 <o, ify- = o. 



(9) Zi 
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We may generate them from u\, . . . , u n £ (0, 1) by 

xjf3 + $-i($(-xTg) + uM-kJP)), if Y % = 1, 
x7/3 + ^ 1 ( Ul $(-x7/3)), ify, = 0. 

Given Z, the distribution of is /3 ~ M((X T X)~ 1 X T Z, (X T Xy 1 ). We 
may generate it using u n+ \, . . . , u n+p £ (0, 1) via 

(10) /3 = (X T X)" 1 X T Z+(X T X)- 1 / 2 : 

Thus equation (10) defines i^ 1 ) while (9) defines (j)( 2 \ 

The framework in [2] allows one to pick a metric that conforms to the prob- 
lem. We use the metric d((/3, Z), (/?', Z')) = max(di(/3, /3'), d 2 (Z, Z')), where 

(11) diOS, /?') = diG9 - /?') = sJ{P - P>) T (X T X)(I3 ~ £') 
and 



(12) d 2 (Z, Z') = d 2 (Z - Z') = ^(Z - Z') T (Z - Z'). 
We show below that 

(13) d((/3^, ZW), (/3'W, Z'W)) < d((/? (fc - 1} , Z^" 1 )), (^'f*- 1 ),^*- 1 ))) 

for pairs (^(M^l*- 1 )), (^'(fc-i^ Z '(fc-i)) of distinct points in O. Both met- 
rics di and d 2 are also norms, which simplifies our task. 

Suppose first that ^ k ~^ = 0'C*- 1 ). Then it follows easily that Z( fc ) = Z'W 
and f3^ = fi'( k \ so then the left-hand side of (13) is 0. As a result, we may 
assume without loss of generality that di(/3 (fc_1) - fi'^V) > 0. With this 
assumption, we will use the bound 

d((/3( fc ),Z( fc )),(/3 / ( fc ),Z / ( fc ))) 
d((/3( fc " 1 ),Z( fc - 1 )),(/3 / ( fc - 1 ),Z / ( fc - 1 ))) 

(14) 

/ d^/gW-^W) d 2 (zw - z'W) 

We begin by studying the update to Z. Subtracting xJ/3 from both sides 
of (9), applying $(•), differentiating with respect to (3 and gathering up 
terms, we find that -§gZi = AjXj where 



(15) 



W-xf^ 
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and (p is the A/"(0, 1) probability density function. 

It is clear that Aj < 1. Next, we show that Aj > 0. We begin by inverting (9) 
to get 



f *(Zi-yZP)-*(-xlP) 



(16) 



Ui 



Substituting (16) into (15) and simplifying yields 



if Yi = l, 



if Yi = 0. 



(17) 



1-Ai 



y(-x7/3)d>(^-x7/3) 
t$(- x T/3)^-xT/3)' 



if Yi = 1, 



if K = 0. 



Now consider the function t(x) = (p(x)/&(x). This function is nonnegative 
and decreasing, using a Mill's ratio bound from [20]. When Yi = l, then 1 — 
Aj = t(xlJ P)/t(xJ P — Zj) < 1 because then Zj > 0. We also used symmetry 
of </?(•)• If instead Yi = 0, then 1 — Aj = r(— xJ/3)/t(— ~xj (3 + Zj) < 1 because 
then Zj < 0. Either way, 1 — Aj < 1 and therefore Aj G [0, 1) for all i. 
Writing the previous results in a compact matrix form, we have 



8Z 

dp 



( dzi 



AX, 



where A = A(/3, Z) = diag(Ai, . 



, A n ). Similarly, equation (10) yields 



Thus, for the Z update with any G (0, 1) 



ra+p 



(18) 



rf 2 (Z( fc ) - Z'W) / <9Z( fc ) 

d l0 9(fc-i)-^(fc-i)) " d2 V^I) 



< sup ||A(/3,Z)X£|| <1. 

/3,Z 

(^o T xe=i 



For the /3 update, applying the chain rule gives 

dpw opw az^- 1 ) 



dp( k ~ l ) 5Z( fc -!) dp^-V 



(X T X)- 1 X T AX 
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and then 

d 1 (/3(^)-^))-^/ 1 U^)V 

= sup di((X T X)~ 1 X T AX£) 

/3,Z 
(X£) T X£=1 

= sup di((X T X)" 1 X T Ar ? ) 

AZ,||rj||=l 

(19) 

= sup \\X(X T X)- 1 X T Arj\\ 

/3,Z,||r?||=l 

< max Aj 

l<i<n 
<1, 

using the nonexpansive property of the projection matrix X(X T X)~ 1 X T . 
By combining (18) with (19), we establish the contraction (13). 

7. Open versus closed intervals. In the Lebesgue formulation, £7(0, l) d 
and U[0, l] d are the same distribution, in that they cannot be distinguished 
with positive probability from any countable sample of independent values. 
Riemann integrals are usually defined for [0,1] d and discrepancy measures 
are usually defined for either [0, l] d or [0, l) d . These latter theories are de- 
signed for bounded functions. 

In Monte Carlo simulations, sometimes values £ {0, 1} are produced. 
These end points can be problematic with inversion, where they may yield 
extended real values, and hence good practice is to select random number 
generators supported in the open interval (0, 1). 

For our Gibbs sampler example with the probit model, we required u^. £ 
(0, l) n+p . This was necessary because otherwise the values 0(x, u) might fail 
to belong to 0. 

Our slice sampler example had equal to the bounded rectangle [a, b]. 
Then values Uij £ {0, 1} do not generate sample points outside £1. 

Our Metropolized independence sampler did not require bounded support. 
It could produce extended real values. Those however are not problematic for 
weak convergence, which is based on averages of l[ a ,b]( x i) or other bounded 
test functions. Also, the chain will not get stuck at an unbounded point. 

8. Discussion. We have demonstrated that MCQMC algorithms formed 
by Metropolis-Hastings updates driven by completely uniformly distributed 
points can consistently sample a continuous stationary distribution. Some 
regularity conditions are required, but we have also shown that those con- 
ditions hold for many, though by no means all, MCMC updates. The result 
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is a kind of ergodic theorem for QMC like the ones in [7] and [37] for finite 
state spaces. 

When RQMC is used in place of QMC to drive an MCMC simulation, 
then instead of CUD points, we need to use weakly CUD points. These 
satisfy Pr(L>; d > e) -> for all e > and all d G N. 

Our version of MCMC above leaves out some methods in which one or 
more components of are generated by acceptance-rejection sampling be- 
cause then we cannot assume d < oo. A modification based on splicing i.i.d. 
U[0, 1] random variables into a CUD sequence was proposed by Liao [27] 
and then shown to result in a weakly CUD sequence in [44]. 

We do not expect that a global substitution of QMC points will always 
bring a large improvement to MCMC algorithms. What we do expect is that 
means of smooth functions of the state vector in Gibbs samplers will often 
benefit greatly from more even sampling. 

It is also a fair question to ask when one needs an MCMC result com- 
puted to the high precision that QMC sometimes makes possible. Gelman 
and Shirley [17] address this issue, distinguishing Task 1 (inference about 
a parameter 6) from Task 2 [precise determination of E(0) or more generally 
E(/(0)) conditional on the data, or a posterior quantile of 0\. The accuracy 
of Task 1 problems may be limited more by sample size than by Monte 
Carlo effort. Task 2 problems include computation of normalizing constants 
and problems where one wants to report numerically stable, and hence more 
reproducible, simulation output. 

APPENDIX: PROOFS 
This Appendix contains the lengthier proofs. 

We need one technical lemma about CUD points. Consider overlapping 
blocks of dfc-tuples from itj, with starting indices d units apart. If it, are CUD 
then these overlapping blocks are uniformly distributed. The proof works by 
embedding the ci/c-tuples into nonoverlapping rdfc-tuples. For large r, the 
boundary effect between adjacent blocks becomes negligible. This result is 
also needed for the argument in [37]. 

Lemma 6. For j > 1, let Uj G [0,1]. For integers d,i,k > 1, let Xj = 
(ud(i-l)+li ■ ■ ■ ) u d(i-l)+dk) ■ If u j are completely uniformly distributed, then 
Xj G [0, l] dk are uniformly distributed. 

Proof. Choose any c G [0, l] dk . Let v = YYj=i c j De the volume of [0, c). 
For integers r > 1, define f r on [0, l] rdk by / r (u) = Y^=o 1 [o,c)(^i<i+i ) ■ • • , 
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Ujd+dk)- Each f r has Riemann integral ((r — \)k + l)t;. We use / r on nonover- 
lapping blocks of length rdk from uy. 

x n \n/{rk)\ 

— l[0,c)( x i) > - /r(«(t-lWfc+li • • -, u irdk) 

i=l i=l 

fr -l)fc + l r-1 
-f > u, 



1 v^n 



after using (4). Taking r as large as we like, we get lim inf n _>oo ^ SILi l[o,c)( x ?) — 
v. It follows that lim inf n ^oo \ J21=i l[a,b)( x t) — Vol[a, b) for any rectangular 
subset [a, b) C [0,1]*. Therefore, lim sup n _ 5 . 00 ^ Ya=i l[o,c)( x i) — v too, for 
otherwise some rectangle [a, b) would get too few points. □ 

Now, we prove the main theorems from Section 3. 

Proof of Theorem 2. Pick e > 0. Now let m £ N and for i = 1, . . . ,n 
define the sequence m , . . . , x.' { m rn G ft as the Rosenblatt-Chentsov trans- 
formation of Uj , . . . , Uj +m . 

Suppose that <j) is regular and for a bounded rectangle [a, b] C M s , let 
/( x ) = ![a,b]( x )- Then 

r i n 

(20) J /(x)vr(x) dx - - ]T 7( x i) = Si + S 2 + S 3 , 



where 



s i= / /(x)7r(x)dx- -^/(x- jm>m ), 

4=1 

1 " 

S 2 = - V /(x- m ) - /( x i+m) 



n . 



and 



1 n 

^3 = - yZfi^i+m) - /( x i)- 
n ^ — * 



n 
i=l 



For Si , notice that x^ mm G [a, b] if and only if , . . . , u d ( i+m ) ) lies 

in a d(m+ 1) -dimensional region B\. The region B\ has volume Jj a b , 7r(x) dx 
because Pr(x^ mm € [a,b]) is / [ab] vr(x)dx when (^(i_i)+i, . . . ,v d ( i+m) ) ~ 

J7[0, i] d ( m + 1 ). it has a Riemann integrable indicator function by hypothesis. 
Then because (fj)j>i are CUD, and using Lemma 6 with k = rn + 1, we get 



/" 1 - 

/ /(x)7T(x)dx- -X,/( x/ i,m,m) 
J i=l 



as n — > oo. 
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Now, consider £2- The only nonzero terms arise when Xj +rn ^x' imm . This 
in turn requires that the coupling region C is avoided m consecutive times, 
by u i+ i,...,u i+m . Then (v di+ i, . . . ,u<f( i+m )) belongs to a region of volume 
at most (1 — Vol(C)) m . Choose m large enough that (1 — Vol(C)) m < e. Then 



lim sup 



■i,m,m-/ 



/(Xi+m) 



t=l 



< £. 



For the third term, | S3 1 is at most m/n, which goes to as n — > 00. Thus, 
we have 



1 n f 
lim -V]l Xj6 ra,b] - / 7r(x)cfx 

n ^°° ™ ^ ./[a,b] 



< e. 



As e > was chosen arbitrarily, the result follows for this case. 

The result holds trivially for the function 1q, hence we are done. □ 



Proof of Theorem 3. We use the notation from the proof of Theo- 
rem 2. As in the proof of Theorem 2, we write f /(x)-7r(x) (fx — — YH=i /( x «) 
as the sum of three terms. The first and third terms vanish by the same 
arguments we used in Theorem 2. 

For the second term, we have 

1 n 

|S 2 (n)| < -J2\f(^'i,m,m) -/(Xi+m)|- 
i=l 

Let e > be arbitrary. We show that limsup,^^ |E2(n)| < e. As e > is 
arbitrary, this then implies that limsup n _ 5>00 |E2(n)| = 0. 

Assume that the Gibbs sampler is regular for rectangles and for a bounded 
positive volume rectangle [a, b] C W let /(x) = lr a b i(x). For < 5 < 
mini<j< d (6j - aj), let 6 = (6,...,6) € W and put f s (x) = l[ a s,b+S] and 

fsi*-) = l[ a+( 5,b-(5]- 

Because fs(x) > /(x) > /_s(x), the triple (/_<y( 

X i m m J > J l X i m m) 5 

/ tf (xJ >ro> J) must be in the set 5 = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}. Like- 
wise /(xj +m ) £ {0,1}. By inspecting all 8 cases in S x {0,1}, we find that 
|S 2 | < 0"l + 02 + CJ3, for 

1 - 

^1 ~~ ^ ] /<5( x i,m,m) — /— <5( x i,m,m)' 
i=l 

1 - 

°" 2 = r £(/-<5( X U,m) - / ( x i+m)) + 



QMC FOR MCMC 27 

and 

1 n 



n . 

1=1 



where z+ = max(z, 0). 
Choose 5 > such that 



/ 



7T (x) dx < 



nn([a-<5,b+<5]\[a+<5,b-<5]) 3 

As the Gibbs sampler is regular for rectangles, (t>j)j>i is a CUD sequence, 
and x^ m m is constructed using the Rosenblatt-Chentsov transformation we 
have 

A({u G [0, : < m , m G [a - 5, b + 5} \ [a, b]}) 

= / 7r(x)dx < |, 

JQn([a-<S,b+<5]\[a+<S,b+5]) J 

and so limsup^^oo |o"i(n)| < e/3. 

The points x' imm and Xj +m have different starting points x' im0 and Xj, 
but are updated m times using the same Uf+i, . . . , Uj +m , that is, x^ m m = 
^m( x i,m,o> u «+i' • • • > u m) and x i+m = m (x i; u i+ i, . . . , u» +m ). Therefore, The- 
orem 3 implies that there is a constant C > such that for all sufficiently 
large m > m* the region 

B m ,i = {(vi, . . . , v m ) G [0, l] dm : (i((/. m (x^ m>0 , (vi, . . . , v m )), 

0m (Xj , ( Vj , . . . , V m 

)))>7 m }, 

has volume at most Ca™. Let £? m = (JiLi ^m,t- Let /3 = oo if [a, b] n ft = or 
O \ [a - 5, b + 6] = and p = inf {d(y , y') : y G [a, b] n O, y' G O \ [a - 5, b + 6] } 
otherwise. 

Let mi = mi(n) be such that Cna™ 1 < e/3 and 7 mi < /?. Now take 
mo > maxjmi, m*, . . . , m* }. For large enough n, we can take mo = mo(n) = 

r lQgr iS/a 7 C/£) l + L Then S ™o has volume at most e/3. 

Thus, /-«(xJ >Tn0jtno ) > /(x i+mo ) implies that d(x^ mo mo ,x i+mo ) > f3, which 
in turn implies that (u»+i, . . . , u i+mo ) G B m0ti , and so (uj +1 , . . . , u i+mo ) G 
B mo . Therefore, we have 



1 n 

limsup|(T2(n)| <limsup-^l (ui+li ... )U )eB = limsup \{B mo ) < 

n— >oo n— >oo ll> mo->oo 

A similar argument shows that limsup n _ 5 . 00 |o"3(n)| < e/3. 
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Combining the three bounds yields 

limsup | £2(71) | < lim sup <Ti (n) + lim sup 02 (n) + lim sup 03 (n) 



e e e 
-3 + 3 + 3 =£ ' 
establishing consistency when the Gibbs sampler is regular. 

Since the result holds trivially for the function 1q, the result follows. □ 

The coupling region in Theorem 2 was replaced by a mean contraction as- 
sumption Jj 1 j d log(£(u)) du < in Theorem 4. This way we obtain (possibly 

different) coupling type regions B m> i for each i = 1, . . . , n. We remedy this sit- 
uation by letting m depend on n, which in turn requires us to use a stronger 
assumption on the CUD sequence (i>j)j>i, namely, that lim n _s >00 D* d » = 0. 

Acknowledgments. We thank Seth Tribble, Erich Novak, Ilya M. Sobol' 
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